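As computer graphics technology advances, images synthesized by software look increasingly like photographs. While this technology delivers a visual feast in games and films, it can also be exploited by people with malicious intent to steer public opinion and provoke political crises or social unrest. Distinguishing computer-generated graphics (CG) from photographs (PG) has therefore become an important topic in digital image forensics. This paper proposes a dual-stream convolutional neural network based on channel joint and SoftPool. The proposed architecture includes a residual module that extracts image noise information and a joint channel information extraction module that captures shallow semantic information from the image. In addition, we design a residual structure to enhance feature extraction and reduce information loss in the residual stream. The joint channel information extraction module obtains shallow semantic information from the input image, which serves as an information supplement to the residual module. The whole network uses SoftPool to reduce the information lost during image downsampling. Finally, we fuse the two streams to obtain the classification result. Experiments on SPL2018 and DSTOK show that the proposed method outperforms existing methods, especially on the DSTOK dataset. For example, our model outperforms the state of the art by 3%.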
translated by 谷歌翻译
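The SoftPool downsampling mentioned in the abstract above can be illustrated with a minimal NumPy sketch. It assumes the commonly used definition of SoftPool, where each output is a softmax-weighted sum of the activations in a pooling window, and it uses non-overlapping windows on a single 2D map. The function name `softpool2d` and the window size are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def softpool2d(x: np.ndarray, k: int = 2) -> np.ndarray:
    """SoftPool over non-overlapping k x k windows of a 2D feature map (illustrative)."""
    h, w = x.shape
    h2, w2 = h // k, w // k
    # Crop so the map tiles evenly, then gather each window into the last axis.
    windows = x[:h2 * k, :w2 * k].reshape(h2, k, w2, k).transpose(0, 2, 1, 3)
    windows = windows.reshape(h2, w2, k * k)
    # Softmax weights inside each window (shifted by the max for numerical stability).
    shifted = windows - windows.max(axis=-1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output is the softmax-weighted sum of the activations in its window.
    return (weights * windows).sum(axis=-1)

if __name__ == "__main__":
    feature_map = np.arange(16, dtype=np.float64).reshape(4, 4)
    print(softpool2d(feature_map, k=2))
```

Because every activation in a window contributes in proportion to its softmax weight, the output lies between the window's average and its maximum, which is why this kind of pooling tends to discard less information than max pooling.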
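Sentence completion (SC) questions present a sentence with one or more blanks to be filled in, with three to five candidate words or phrases offered as options. SC questions are widely used for students learning English as a second language (ESL). In this paper, we present a large-scale SC dataset, \textsc{sc-ques}, which consists of 292,517 ESL SC questions from real-world standardized English examinations. Furthermore, we build a comprehensive benchmark for automatically solving SC questions by training large-scale pre-trained language models on the proposed \textsc{sc-ques} dataset. We conduct a detailed analysis of the baseline models' performance, limitations, and trade-offs. The data and our code are available for research purposes at \url{https://github.com/ai4ed/sc-ques}.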
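Online dialogic instructions are a set of pedagogical instructions used in real-world online educational contexts to motivate students, help them understand learning materials, and build effective study habits. Despite the popularity and advantages of online learning, the education technology and educational data mining communities still lack large-scale, high-quality, and well-annotated teaching instruction datasets for studying computational approaches that automatically detect online dialogic instructions and further improve online teaching effectiveness. Therefore, in this paper, we present \textsc{dialogId}, a dataset for online dialogic instruction detection that contains 30,431 effective dialogic instructions. These teaching instructions are well annotated into 8 categories. Furthermore, we utilize prevalent pre-trained language models (PLMs) and propose a simple yet effective adversarial training learning paradigm to improve the quality and generalization of dialogic instruction detection. Extensive experiments demonstrate that our approach outperforms a wide range of baseline methods. The data and our code are available for research purposes at \url{https://github.com/ai4ed/dialogid}.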
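We present a simple but effective method for recommending high-quality and diverse exercises to students. Our method consists of three key components: (1) a candidate generation module; (2) a diversity-promoting module; and (3) a scope restriction module. The proposed method improves overall recommendation performance in terms of recall and increases the diversity of the recommended candidates by 0.81\% compared with the baselines.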
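Knowledge tracing (KT) is the task of using students' historical learning interaction data to model their knowledge mastery over time, so as to predict their performance on future interactions. Recently, remarkable progress has been made in applying various deep learning techniques to the KT problem. However, the success behind deep learning based knowledge tracing (DLKT) approaches is still somewhat mysterious, and proper measurement and analysis of these DLKT approaches remain a challenge. First, data preprocessing procedures in existing works are often private and/or custom, which limits experimental standardization. Furthermore, existing DLKT studies often differ in their evaluation protocols and are far removed from real-world educational contexts. To address these problems, we introduce a comprehensive Python-based benchmark platform, \textsc{pykt}, to guarantee valid comparisons across DLKT methods via thorough evaluations. The \textsc{pykt} library consists of a standardized set of integrated data preprocessing procedures on 7 popular datasets across different domains, and 10 frequently compared DLKT model implementations for transparent experiments. Results from our fine-grained and rigorous empirical KT studies yield a set of observations and suggestions for effective DLKT. For example, a wrong evaluation setting may cause label leakage, which generally leads to performance inflation, and the improvement of many DLKT approaches is minimal compared to the very first DLKT model proposed by Piech et al. \cite{piech2015deep}. We have open-sourced \textsc{pykt} and published our experimental results at \url{https://pykt.org/}. We welcome contributions from other research groups and practitioners.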
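To date, most existing steganalysis methods have been designed for grayscale images and are not suitable for the color images widely used in current social networks. In this paper, we design a universal color image steganalysis network (called UCNet) for the spatial and JPEG domains. The method consists of preprocessing, convolutional, and classification modules. To preserve the steganographic artifacts in each color channel, the preprocessing module first separates the input image into three channels according to the corresponding embedding space (i.e., RGB for spatial steganography and YCbCr for JPEG steganography), then extracts image residuals with 62 fixed high-pass filters, and finally concatenates all truncated residuals for subsequent analysis rather than adding them together with a normal convolution, as existing CNN-based steganalyzers do. To accelerate network convergence and effectively reduce the number of parameters, the convolutional module uses three carefully designed types of layers with different shortcut connections and group convolution structures to further learn high-level steganalytic features. In the classification module, we use global average pooling and a fully connected layer for classification. Extensive experiments on ALASKA II demonstrate that the proposed method achieves state-of-the-art results compared with modern CNN-based steganalyzers (e.g., SRNet and J-YeNet) in both the spatial and JPEG domains, while keeping relatively low memory requirements and training time. Furthermore, we provide the necessary descriptions and many ablation experiments to verify the rationality of the network design.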
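The preprocessing steps described above (split the image into color channels, filter each channel with fixed high-pass kernels, truncate the residuals, and concatenate them) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not list the 62 filters or the truncation threshold, so the two 3x3 kernels in `FILTERS`, the threshold of 3.0, and the function name `channel_residuals` are hypothetical stand-ins.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Two illustrative 3x3 high-pass kernels (hypothetical stand-ins; the abstract
# only states that 62 fixed filters are used, not which ones).
FILTERS = np.array([
    [[0, 0, 0], [0, -1, 1], [0, 0, 0]],   # first-order horizontal difference
    [[0, 1, 0], [1, -4, 1], [0, 1, 0]],   # Laplacian
], dtype=np.float64)

def channel_residuals(image: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return truncated residuals of shape (C * num_filters, H, W) for an (H, W, C) image."""
    _, _, channels = image.shape
    maps = []
    for ch in range(channels):
        # Filter each color channel separately so per-channel artifacts are kept apart.
        padded = np.pad(image[:, :, ch].astype(np.float64), 1, mode="reflect")
        windows = sliding_window_view(padded, (3, 3))        # (H, W, 3, 3)
        res = np.einsum("hwij,fij->fhw", windows, FILTERS)   # (num_filters, H, W)
        # Truncate residuals to [-threshold, threshold] (threshold is an assumption).
        maps.append(np.clip(res, -threshold, threshold))
    # Concatenate along the feature axis instead of summing across channels.
    return np.concatenate(maps, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 256, size=(8, 8, 3))
    print(channel_residuals(rgb).shape)
```

Concatenating keeps one residual map per channel and filter, whereas a normal convolution over all three channels would sum the per-channel responses into a single map.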
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only image-level labels. Most existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet they ignore the co-occurrence confounder between the object and its context (e.g., fish and water), which makes it hard for the model to distinguish object boundaries. Besides, the use of CAM also brings a dilemma: classification and localization always suffer from a performance gap and cannot reach their highest accuracy simultaneously. In this paper, we propose a causal knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and in addressing the dilemma between classification and localization performance.
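For readers unfamiliar with the baseline that this abstract builds on, here is a minimal NumPy sketch of standard Class Activation Mapping: the map for a class is the sum of the final convolutional feature maps weighted by that class's weights in the linear classifier that follows global average pooling. It shows plain CAM only, not KD-CI-CAM, and the names `raw_cam` and `normalize_cam` as well as the array shapes are illustrative assumptions.

```python
import numpy as np

def raw_cam(features: np.ndarray, fc_weights: np.ndarray, class_idx: int) -> np.ndarray:
    """Weighted sum of feature maps for one class.

    features:   (K, H, W) final convolutional feature maps
    fc_weights: (num_classes, K) weights of the linear layer after global average pooling
    """
    return np.tensordot(fc_weights[class_idx], features, axes=([0], [0]))  # (H, W)

def normalize_cam(cam: np.ndarray) -> np.ndarray:
    """Rescale a map to [0, 1] so it can be thresholded into a localization mask."""
    shifted = cam - cam.min()
    peak = shifted.max()
    return shifted / peak if peak > 0 else shifted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.random((4, 5, 5))
    weights = rng.normal(size=(3, 4))
    print(normalize_cam(raw_cam(feats, weights, class_idx=1)).round(2))
```

The spatial mean of the raw map equals the class logit (ignoring any bias term), so the same weights drive both classification and localization. This coupling is one way to read the classification versus localization tension that the abstract mentions.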
In this paper, a semantic communication framework for image transmission is developed. In the investigated framework, a set of servers cooperatively transmit images to a set of users using semantic communication techniques. To evaluate the performance of the studied semantic communication system, a multimodal metric is proposed to measure the correlation between the extracted semantic information and the original image. To meet the ISS requirement of each user, each server must jointly determine the semantic information to be transmitted and the resource blocks (RBs) used for semantic information transmission. We formulate this as an optimization problem that aims to minimize each server's transmission latency while meeting the ISS requirement. To solve this problem, a value decomposition based entropy-maximized multi-agent reinforcement learning (RL) algorithm is proposed, which enables servers to coordinate during training and execute RB allocation in a distributed manner, approaching globally optimal performance with fewer training iterations. Compared to traditional multi-agent RL, the proposed RL improves the servers' exploration of valuable actions and the probability of finding a globally optimal RB allocation policy based on local observations. Simulation results show that the proposed algorithm can reduce the transmission delay by up to 16.1% compared to traditional multi-agent RL.
GPUs built on newer architectures, such as the A100, are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology gives users more flexibility to support both deep learning training and inference workloads, but using it efficiently can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to employ MIG effectively, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
With the development of technology and the sharing economy, Airbnb, a well-known short-term rental platform, has become the first choice for many young people. Airbnb pricing has long been a problem worth studying. While previous studies achieve promising results, several deficiencies remain: (1) the feature attributes of rentals are not rich enough; (2) the research on rental text information is not deep enough; (3) few studies predict the rental price in combination with the points of interest (POI) around the house. To address these challenges, we propose a multi-source information embedding (MSIE) model to predict Airbnb rental prices. Specifically, we first select statistical features to embed the original rental data. Second, we generate word feature vectors and emotional scores from three different kinds of text information and combine them to form the text feature embedding. Third, we use the POI around the rental house to generate a variety of spatial network graphs, and learn embeddings of these networks to obtain the spatial feature embedding. Finally, we combine the three modules into multi-source rental representations and use a fully connected neural network to predict the price. Analysis of the experimental results shows the effectiveness of the proposed model.